Introduction
‘Take some more tea,’ the March Hare said to Alice, very earnestly.
‘I’ve had nothing yet,’ Alice replied in an offended tone, ‘so I can’t take more.’
‘You mean you can’t take LESS,’ said the Hatter: ‘it’s very easy to take MORE than nothing.’
Animals live in complex environments that change constantly, over both short and long timescales. The ability to recognize and respond appropriately to such change is crucial to an animal’s survival. Cognitive or behavioural abilities that have evolved to enable such responses are widespread, and animals can demonstrate them even in the simplified environment of a lab experiment. One such experimental paradigm that requires an animal to learn and modify its behaviour in response to environmental change is the seemingly simple one of reversal learning.
In a reversal learning experiment an animal is first faced with a simultaneous choice between two stimuli, only one of which is paired with a reward. After a certain number of trials, when the animal has likely learned the association between the correct stimulus and the reward, the reward contingencies of the two stimuli are reversed. In a serial reversal learning procedure the reward contingencies reverse repeatedly. An animal that chooses the rewarded stimulus more frequently than the non-rewarded stimulus is judged to perform better on the task; a choice of the non-rewarded stimulus is, in the context of the task, an ‘error.’ The serial reversal learning protocol can be adapted to the behaviour and sensory physiology of many different species: using visual stimuli in bumblebees (Strang and Sherry 2014) and guppies (Boussard et al. 2020); visual and spatial stimuli in corvids (Bond, Kamil, and Balda 2007); spatial stimuli in rats (Boulougouris, Dalley, and Robbins 2007), great tits (Hermer et al. 2018) and gray squirrels (Chow et al. 2015); and olfactory stimuli in rats (Kinoshita et al. 2008). This adaptability has allowed for illuminating comparative research on the cognitive and behavioural capabilities of different species.
Comparative research on reversal learning, specifically serial reversal, has been used as an explicit measure of animal intelligence. Bitterman (1964) argued that performance on this task might be indicative of a qualitative difference in intelligence between animals that were ‘lower’ and ‘higher’ in the evolutionary ‘hierarchy’: pigeons, rats and monkeys showed a progressive improvement in performance, whereas turtles and fish did not. The hierarchical ranking of animals in this way, using ‘intelligence’ or any other criterion, is scientifically meaningless - animal species are not more or less evolved than one another, but branching lineages arising from common ancestors. Moreover, ‘intelligence’ in this sense, defined in terms of reasoning, planning and rule-learning, is a very anthropocentric framing of the concept.
‘Intelligence’ is what we might infer animals to have, based on their behaviour. Learning, by contrast, is demonstrable; it is something that is definitely known to happen in reversal learning, and it is therefore a more meaningful criterion when comparing the performance of different animals. First-order learning happens when an animal learns the stimulus-reward association and changes its behaviour accordingly. Discrimination learning requires animals to learn a specific, arbitrary response to each of multiple stimuli; reversal learning is essentially a specific type of discrimination learning. Because the same stimuli are successively paired with a reward and then not paired with a reward, animals can, in principle, develop strategies in their response behaviour. Higher-order or second-order learning is the learning of rules or strategies. The rule that leads to optimal performance in a reversal learning task is ‘win-stay; lose-shift,’ which in practice means one ‘error’ per reversal. After learning the task, a perfectly rational animal will first exclusively choose the stimulus that is paired with reward. At the first choice of this stimulus that does not give a reward (the error), the animal will change its preference and exclusively choose the other stimulus, which is now paired with a reward. Progressive ‘improvement’ in this task, where an animal makes fewer and fewer errors per reversal, is an indication that the animal is learning the rule of reversal, or ‘learning to learn’ (Shettleworth 2010).
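The logic of ‘win-stay; lose-shift’ can be made concrete with a short simulation (a hypothetical sketch, not part of the study; all names are illustrative): an agent that follows the rule perfectly makes exactly one error per reversal.

```r
# Hypothetical sketch: a perfect 'win-stay; lose-shift' agent on a serial
# reversal task with 50-trial blocks. The agent keeps choosing its current
# option while it is rewarded and shifts immediately after one unrewarded choice.
simulate_wsls <- function(n_trials = 50, n_reversals = 5) {
  rewarded <- 1   # which of the two options currently pays off
  choice   <- 1   # the agent's current preference
  errors   <- 0
  for (trial in seq_len(n_trials * (n_reversals + 1))) {
    # contingencies reverse at the start of each new block
    if (trial %% n_trials == 1 && trial > 1) rewarded <- 3 - rewarded
    if (choice != rewarded) {   # 'lose': count the error, then shift
      errors <- errors + 1
      choice <- 3 - choice
    }                           # 'win': stay
  }
  errors
}

simulate_wsls()  # returns 5: exactly one error per reversal
```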
Comparative research from the primate literature reveals a very interesting difference in these two types of learning (first- and second-order). Rumbaugh and colleagues (Rumbaugh, Savage-Rumbaugh, and Washburn 1996) compared 13 different species of primates on a visual reversal learning task. The animals were first trained to discriminate between a pair of stimuli to one of two performance criteria, 67% or 84% choice for the rewarded option, and then given a single reversal of reward contingencies. If the primates’ behaviour was driven mainly by the strength of reinforcement (first-order learning), they should make fewer errors after the reversal when the criterion was 67%. If second-order learning or rule-learning was occurring, there would be fewer errors when trained to 84%. The results showed that prosimian species tended to perform better when trained to 67%; apes when trained to 84%; and monkeys were intermediate.
Contrary to the hierarchy once presumed among animals, progressive improvement on the task, or evidence of rule-learning, is not restricted to apes. On the serial reversal learning task bumblebees show progressive improvement in their performance with each reversal (Strang and Sherry 2014). (Bees also show evidence of proactive interference, however: their performance decreases in the last few trials of the experiment.) Guppies show progressive improvement as well as an increase in their probability of success on a visual serial reversal task (Boussard et al. 2020). Such progressive improvement with serial reversals has also been seen in several species of birds, including two different species of great tits (Hermer et al. 2018) and three species of corvids (Bond, Kamil, and Balda 2007). The corvid species not only show improvement but significant transfer across stimulus modality: strong evidence of rule-learning.
What performance on the serial reversal task says about the deeper cognitive mechanisms at work is not a settled question. There is an important distinction between cognitive and behavioural flexibility, and it is important to understand which of these the serial reversal learning task, or indeed any reversal learning task, is actually measuring. Cognitive flexibility, that is, cognitive change in response to changes in internal state or the external environment, cannot be directly observed. It is inferred to have occurred through changes in behaviour, i.e., behavioural flexibility (Tait et al. 2018). However, observing behavioural flexibility does not necessarily indicate cognitive flexibility. As Dhawan, Tait, and Brown (2019) showed, the reversal learning task (not the serial reversal learning task) is evidence of behavioural flexibility, but not of cognitive flexibility per se.
The term ‘behavioural flexibility’ itself has been used widely to describe many different traits and behaviours. This has decreased, rather than added to, the clarity of the term’s usage, as many of these traits are non-equivalent and are in fact controlled by different neural mechanisms (Audet and Lefebvre 2017). The point, however, is that the ability to respond behaviourally to change, whether that change is internally driven (for example, a change in hunger, metabolic state, or satiety) or externally motivated (for example, a change in resource availability), is an adaptive advantage. Ultimately, the serial reversal learning task requires an animal to recognise and respond to a change in the rewarding status or the availability of resources. The way an animal’s foraging environment typically changes exerts selective pressure on its behaviour, and behaviour evolves in response. The kind of behavioural flexibility that is required to deal with seasonal changes in fruit availability is not the same as the kind of flexibility required to capture a prey animal intent on escaping.
Animals that feed on flower nectar are faced with an environment where resources that are rewarding at certain times are not rewarding at others. Flowers can replenish their nectar supply, the nectar can be tapped and emptied by other nectar-feeding animals, and this cycle can repeat again and again. There is a sense in which the foraging environment of a nectar-feeding animal is a natural analogue of the serial reversal task, potentially selecting for the ability to exploit the environment in an energy-efficient way. Certainly bumblebees are capable of performing well at the serial reversal task; our research focused on another nectar-feeding animal, with certain key differences in its foraging ecology compared to bees.
Commissaris’s long-tongued bat, Glossophaga commissarisi, is a glossophagine bat species from Central and South America. Unlike bees, these bats forage for themselves rather than for a colony, and they have remarkably high metabolic rates for their body mass (Voigt and Winter 1999; Voigt, Kelm, and Visser 2006). This is due to the energetic demands of hovering flight (Winter and Helversen 1998; O. von Helversen and Reyer 1984), and they rely primarily on flower nectar to meet this demand (what’s a good citation for this?). The sugars from the nectar fuel the bats’ high metabolism directly, with very little conversion to or storage as fat tissue (Kelm et al. 2011; Voigt and Speakman 2007). As flowers yield only small droplets of nectar on each visit (about 193 J of energy per visit: Voigt, Kelm, and Visser 2006), the bats need to make several hundred flower visits per night to satisfy their energetic requirements. Efficiency is, therefore, critical to the bats’ foraging behaviour.
The bats’ foraging ecology is shaped by the importance of flower nectar as a resource. Plants fed on by the bats rely on them for the crucial function of pollination. The flowers have conspicuous olfactory, echo-acoustic and visual cues to attract the bats and enable them to find the flowers (D. von Helversen and Helversen 1999; Winter, Merten, and Kleindienst 2005). Many bat-pollinated plants put out only a few flowers at a time, each blooming for a long time (Kunz and Fenton 2005). The nectar itself is a self-replenishing resource, meaning the same flower can be profitably exploited at multiple time-points. Thus, the bats rely primarily on their excellent spatial memory to relocate a profitable flower (Winter 2005; Toelch et al. 2008), and this memory can last up to a few weeks after the initial learning (Rose et al. 2016). Estimating when to come back is another important aspect of the bats’ cognitive processing. To be worth the energetic cost of a visit, the flower’s nectar levels need to be sufficiently replenished, which takes a certain amount of time. However, if a bat waits too long to return, competing conspecifics can deplete the refilled flower. Repeated visits to a flower therefore require the bat to both remember its location and estimate its expected reward value. If a bat visits a flower and finds it full of nectar, the optimal strategy is to exploit it fully before a competitor can find it; to leave the flower in search of others when it is empty; to remember the location of the flower; and to return to it when sufficient time has passed for it to refill, but not so long that a competitor can find it.
A reversal learning task requires an animal to respond to a change in the profitability of the options available to it, and to remember all the rewarding options it has experienced in its environment: tasks that nectar-feeding bats perform many times in a typical foraging bout. We carried out a serial reversal learning task with wild G. commissarisi individuals. Over three nights the bats were given two potentially rewarding options to choose between. At the start of the night, only one of the options was rewarding and the other was not. The two options were at unique spatial locations, separated by a distance sufficient for the bats to discriminate them. After a certain number of visits had been made by the bats, the reward contingencies reversed without any cue: the previously rewarding option became unrewarding and the previously unrewarding option became rewarding. This reversal occurred five times a night, every night.
Our aims with this experiment were as follows. Firstly, we wanted to demonstrate that the bats were capable of reversal learning. We believed this to be extremely likely as the behavioural requirements of the task are typical features of the animals’ foraging ecology. Secondly, if the bats demonstrated the ability to respond to the reversals, we wanted to explore how this is reflected in their decision-making. What changes are seen in the relative number of visits made to the rewarding and the non-rewarding options? Thirdly, we wanted to see if the bats were capable of second-order learning, or ‘learning to learn.’ Could the bats learn the rule behind the change in their environment and use the optimal strategy of one error per reversal?
We found that the bats learned the task very quickly, showing an extremely high overall preference for the rewarding option. With increasing experience of reversals, the bats switched their preference to the newly rewarding option more and more rapidly after each reversal. They also showed an overall increase in the percentage of visits made to the rewarding option, both across successive nights and across successive reversals. The latter effect diminished with each successive reversal, very likely due to a ‘ceiling effect’: the bats’ preference was already too high to allow much further increase. After the analyses described above were done and the data and results examined, we performed further, exploratory analyses to probe the conclusions of our confirmatory analyses. The difference in status between these two sets of results should be kept in mind.
Firstly, we reasoned that one important component affecting the bats’ increasing preference for the rewarding option was the difference between the first visits of a night, before any experience of a reversal, and all the subsequent visits after at least one reversal had occurred. The exploratory analysis revealed that there is indeed a differential effect: bats make a larger proportion of their visits to the rewarding option before any experience of a reversal on a given night. Secondly, we wanted to explore the effect of the asymptotic level of performance (the highest stable proportion of visits to the rewarding option after a reversal) on the performance immediately after a reversal. Did a higher asymptote before a reversal lead to higher performance immediately after the reversal, reminiscent of reversal learning in apes, or to lower performance, reminiscent of reversal learning in prosimian primates? We found a large interaction between the asymptote and the experimental night, but an overall negative effect of the asymptote: the higher the proportion of rewarded visits before a reversal, the more ‘difficult’ it was for the bats to reverse their preference.
Methods
Study site and subjects
The experiment was conducted from the 28th of June to the 25th of July, 2017, at La Selva Biological Field Station, Heredia Province, Costa Rica. Male and female individuals of the species Glossophaga commissarisi were captured from the wild for the experiment. The bats were attracted to a particular location in the forest using sugar-water (see Reward below) as bait and then caught in mist-nets. The bats were sexed and the selected individuals were then taken to two flight-cages (4 × 6 m). The flight-cages had mesh walls and therefore the same climatic conditions as the surrounding environment. A group of four bats at a time was put into a flight-cage; all the individuals in a group were of the same sex. The bats were weighed, and radio-frequency identification (RFID) tags uniquely assigned to each bat were placed around their necks as collars. The bats were then released into the flight-cages, where they could fly freely.
Before the start of the experiment the procedure was tested with four females and refinements were made to the procedure; the data from these individuals were not analyzed. Sixteen bats participated in the main experiment. Two of these did not drink a sufficient amount of sugar-water to meet minimum energy requirements; they were released before the end of the experiment, were not replaced, and their data were not analyzed. Thus, 14 bats in total (seven males and seven females) completed the experiment, and the data from these animals were analyzed. At the end of the experiment the RFID collars were removed and the bats were weighed to make sure they were still at a healthy weight. No blinding was done, as all data collection was completely automated.
Animal experimental procedures were reviewed and permission for animal experimentation and RFID-tagging was granted by Sistema Nacional de Areas de Conservación (SINAC) at the Ministerio de Ambiente y Energía (MINAE), Costa Rica.
Experimental Setup
Reward
The reward received by the bats during the experiment was also their main source of food. The reward was a 17% by weight solution of sugar dissolved in water (prepared fresh every day), hereafter referred to as ‘nectar.’ The sugar consisted of a 1:1:1 mass-mixture of sucrose, fructose and dextrose. The nectar was thus similar in composition and concentration to the nectar produced by wild chiropterophilous plants (Baker, Baker, and Hodges 1998).
Flower and pump setup
Each flight-cage had a square plastic frame in the center (2 × 2 × 1.5 m). Eight reward-dispensing devices, hereafter referred to as ‘flowers,’ were fixed in a radial pattern on this frame, two on each side of the square (see Figure 1), with a distance of 40 cm between adjacent flowers. This is a distance the bats can discriminate (Thiele and Winter 2005). Each flower had the following parts: an RFID reader mounted on a plastic cylinder around the head of the flower; an infra-red light-barrier beam; and an electronic pinch valve through which a PVC tube passed and was fixed to the head of the flower.
A stepper-motor pump was placed in the center of the plastic frame in each cage. The pumps contained a 25 mL Hamilton glass syringe (Sigma Aldrich). The precision of the two pumps differed slightly: the pump in Cage 1 delivered 2.11 \(\mu\)L per step of the stepper-motor, and the pump in Cage 2, 3.33 \(\mu\)L per step. The glass syringe was connected to the tubing system of the flowers through five pinch valves. The pinch valves controlled the flow of liquid from the pump to the system and from a reservoir of liquid to the pump. The reservoir (500 mL thread bottle, Roth, Germany) was filled with fresh nectar every day and connected to the syringe through the valves.
When a tagged bat approached a flower, its individual RFID number was read by the reader. If the bat then poked its nose into the flower and broke the light barrier, a reward was released: the pinch valve opened and the pump moved the pre-programmed number of steps to dispense nectar to the head of the flower, where the bat could easily hover and lick up the nectar. A reward was triggered only when both events occurred, i.e., the RFID reader detected a bat and the light-barrier was broken.
The flowers and the pump were connected to a Lenovo ThinkPad laptop computer, which ran the experimental programs and the programs used to clean and fill the system (PhenoSoft Control 16, PhenoSoft GmbH, Berlin, Germany). The raw data were recorded to this computer as comma-separated values (CSV) files.
Experimental procedure
Every day at around 10 h the old nectar was emptied from the system. The system was rinsed and filled with plain water until 15 h, when it was filled again with fresh nectar. Twice a week the system was filled with 70% ethanol for an hour to prevent microbial growth, then repeatedly rinsed with water.
Four bats, all of the same sex, were placed in a flight-cage as a group. There were four such groups in total, and data were collected simultaneously from two groups, one in each flight-cage. Each bat was uniquely assigned two adjacent flowers on the same side of the square frame, out of the array of eight; these flowers were programmed to reward only that one of the four bats in the cage. After the system was filled with fresh nectar at approximately 17 h, the program was left running for data collection until the next morning. The bats could thus begin visiting the flowers to collect a reward whenever they chose, which was at approximately 18 h every night.
During the course of the night, when the syringe of the pump had been emptied, the pump re-filled automatically. This event happened only once every night. On the main experimental days this process took 4.5 minutes (SD = 0.18) for the horizontal pump, and 2.43 minutes (SD = 0.04) for the vertical pump. About 1% (SD = 0.74) of all visits made by the bats over all three experimental nights happened during the pump refill events, and the bats did not receive any reward on these visits, even if they were made to the rewarding flower.
Every night the bats were also given ad libitum supplemental food: 3.5 g of hummingbird food (NektarPlus, Nekton) in 100 mL of water and 3.5 g of milk powder (Nido 1+, Nestle) in 100 mL of water. They were also given a small bowl of locally-sourced bee pollen.
Experimental design
The experiment proceeded through the following stages.
Training to use a flower
On the night the naive bats were captured and placed into the flight-cages, they could receive a reward from any of the flowers whenever they visited them throughout the night. To enable the bats to find the flowers, a small cotton pad soaked in dimethyl disulphide was placed on each flower. This is a chemical attractant produced by many bat-pollinated flowers (O. von Helversen, Winkler, and Bestmann 2000). A small drop of honey was applied to the inside of the flowers to encourage the bats to place their heads inside, break the light-barrier and trigger a nectar reward. By the end of the night all the bats had found the flowers and learned to trigger rewards quickly.
Training to use two specific flowers
After the bats had learned to trigger rewards, the next stage of training involved assigning each bat uniquely to two of the eight flowers in the array. For an individual animal, only the two flowers assigned to it would be rewarding from this stage of training until the end of the experiment. Because the bats had already learned to trigger a reward at the flowers, the cotton pads with the chemical attractant were no longer provided and no honey was applied. This stage was otherwise similar to the previous one, except that each bat could only trigger a reward at its assigned flowers.
Alternation
To ensure that the bats were familiar with both flowers assigned to them they went through one final stage of training: forced alternation between the two assigned flowers all night long.
Main Experiment
In this serial reversal learning task the bats had to choose between a flower that gave 40 \(\mu\)L of nectar and one that gave no reward at all. The location of the rewarding flower was not cued, but through the Alternation phase of training each bat knew the locations of both flowers that were potentially rewarding to it. After a bat had made 50 visits in total to its two flowers a reversal occurred: the previously rewarding flower became non-rewarding and vice versa. Importantly, only visits to the two flowers assigned to a bat counted towards this visit tally, not visits to any of the other flowers, which were unrewarding to that particular bat. Reversals occurred at regular intervals of 50 visits until the bat either stopped making visits or reached a maximum of 300 visits in a night; after making 300 visits a bat could no longer receive a reward on that experimental night. There were thus five reversals per night. The batch of 50 visits between two consecutive reversals (during which the locations of the rewarding and unrewarding flowers remained stable) was termed a ‘reversal block’; the first 50 visits of a night, before the bat had experienced any reversal that night, also counted as a reversal block. This stage of the experiment was repeated for three nights in a row. The same flower was the first to be rewarding at the start of every night. Thus, because there were five reversals every night (six blocks of 50 visits), if a bat completed the maximum of 300 visits on a night, the last flower to be rewarding that night was non-rewarding at the start of the next night.
Statistical analysis
All the models were fit in a Bayesian framework using Hamiltonian Monte Carlo in the R package brms (Bürkner 2017), which is a front-end for rstan (Stan Development Team 2020).
All the visits made by the bats during a night, up to a maximum of 300, were included in the analyses. There were three experimental nights, divided into six blocks of 50 visits each. At the end of the first five blocks a reversal occurred and the end of the last block was the end of data-collection for the night. Each block was further divided into five bins, each consisting of ten visits, in order to examine the bats’ behaviour within each block.
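Assuming a data frame with one row per visit, the block and bin structure can be derived from each bat’s visit number within a night; the following is an illustrative sketch, not the original analysis code, and all variable names are assumptions:

```r
# Illustrative sketch: derive the reversal block (1-6) and the 10-visit
# bin (1-5) from each bat's within-night visit number (1-300).
visits <- data.frame(bat = "A01", night = 1, visit_no = 1:300)
visits$block <- ((visits$visit_no - 1) %/% 50) + 1          # six blocks of 50 visits
visits$bin   <- (((visits$visit_no - 1) %% 50) %/% 10) + 1  # five bins per block

visits[50:51, ]  # visit 50 ends block 1 (bin 5); visit 51 starts block 2 (bin 1)
```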
We defined a perseverative visit as a visit to the previously-rewarding option just after the occurrence of a reversal, up until the first visit to the newly-rewarding option. By definition this could not happen in the first block of a night. A generalized linear mixed-model with a negative-binomial likelihood function was used to investigate the effect of experimental night and reversal block on the number of perseverative visits. Experimental night, reversal block and their interaction were fixed effects, and random slopes and intercepts were used to fit regression lines for each individual animal.
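In brms, a model of this structure might be specified roughly as follows; this is a sketch under assumed data-frame and variable names, not the original analysis script:

```r
library(brms)

# Sketch: negative-binomial GLMM of the count of perseverative visits, with
# night, block and their interaction as fixed effects and per-individual
# random intercepts and slopes. 'persev_data' and its columns are assumed names.
fit_persev <- brm(
  n_perseverative ~ night * block + (night * block | bat_id),
  data   = persev_data,
  family = negbinomial()
)
```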
We also examined the proportion of visits made to the rewarding flower. This was defined as the number of visits to the rewarding flower divided by the total number of visits in a bin. The total number of visits in a bin only included visits made to the two flowers assigned to a bat; visits to unassigned flowers were not considered in the analysis. The model was fit using a binomial likelihood function, with experimental night, block, bin and their interactions as fixed effects; random slopes and intercepts were used to fit regression lines for the individuals.
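A binomial model of this kind is naturally expressed in brms as an aggregated binomial, with one row per bin; again, a sketch with assumed names rather than the original script:

```r
library(brms)

# Sketch: aggregated binomial GLMM where each row is one 10-visit bin.
# 'n_rewarded' out of 'n_total' visits went to the rewarding flower;
# 'bin_data' and all column names are assumed.
fit_prop <- brm(
  n_rewarded | trials(n_total) ~ night * block * bin +
    (night * block * bin | bat_id),
  data   = bin_data,
  family = binomial()
)
```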
After examining the above results, further analyses were done. It is important to note that these were exploratory, and the ideas were suggested to us by the results of the intended analyses described above.
A second model was fit to the proportion of visits to the rewarding flower to take into account the fact that the first night and the first block of each night were qualitatively different from the others: on the first night the animals had had no prior experience of any reversals, and during the first block of every night they had not yet experienced a reversal on that night. This difference was reflected in the fit of the posterior predictions made from the first model. The second model of these data was identical to the first except for the addition of experimental night and block as two-level factor variables, with the first night (or the first block of every night) as one level and the other nights (or other blocks of each night) as the other level. The two models were compared using leave-one-out cross-validation, implemented in brms using the package loo (Vehtari, Gelman, and Gabry 2017).
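The comparison might look as follows in brms; a self-contained sketch with assumed data and variable names, not the original code:

```r
library(brms)

# Sketch: add two-level factors marking the first night and the first block
# of each night, refit, and compare the fits by leave-one-out cross-validation
# (via the loo package, which brms uses).
bin_data$first_night <- factor(bin_data$night == 1)
bin_data$first_block <- factor(bin_data$block == 1)

m1 <- brm(n_rewarded | trials(n_total) ~ night * block * bin +
            (night * block * bin | bat_id),
          data = bin_data, family = binomial())
m2 <- update(m1, formula. = ~ . + first_night + first_block,
             newdata = bin_data)

loo_compare(loo(m1), loo(m2))  # higher elpd indicates better predictive fit
```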
We reasoned that a comparison of the bats’ behaviour just before and after a reversal might reveal something of the learning mechanisms at work. If a higher proportion of visits to the rewarding flower just before a reversal is predictive of a higher proportion of visits to the rewarding flower just after, that might potentially indicate that the bats are learning the ‘rule’ behind the reversals. On the other hand, if there is no rule-learning and the animals’ choices are driven by the amount of reinforcement received at the two options, we would expect the opposite: a higher proportion of visits to the rewarding flower before the reversal would predict a lower proportion just after it, as the animals take longer to ‘reverse’ their choices away from a highly reinforced option. We took the proportion of visits to the rewarding option, averaged over the last three bins of a reversal block for each individual, as the ‘asymptote’ of the bats’ choice behaviour. We fit a generalized linear mixed-model with the proportion of visits to the rewarding option in the first bin just after a reversal as the response variable, with the fixed effects asymptote, a continuous variable, and night, a factor variable. Random slopes and intercepts were used to fit regression lines for each individual animal.
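This asymptote model might be sketched as below; the interaction term reflects the asymptote-by-night interaction reported in the Introduction, and the data frame and column names are assumptions:

```r
library(brms)

# Sketch: binomial GLMM of the proportion of rewarding-option visits in the
# first bin after a reversal, predicted by the pre-reversal 'asymptote'
# (continuous, mean of the last three bins of the preceding block) and
# night (factor). 'asym_data' and its columns are assumed names.
fit_asym <- brm(
  n_rewarded_first_bin | trials(n_first_bin) ~ asymptote * night +
    (asymptote * night | bat_id),
  data   = asym_data,
  family = binomial()
)
```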
Weakly informative priors were used. The random intercepts and slopes were given a Normal distribution with a mean of 0 and a standard deviation drawn from a half-Cauchy distribution with location 0 and scale 1. All the models were estimated using 4 chains with a thinning interval of 3, with 1200 warm-up and 1300 post-warm-up samples for the model additionally treating the first experimental night and block differently; 2000 warm-up and 2000 post-warm-up samples for the model of the first bin of 10 visits after a reversal; and 1000 warm-up and 1000 post-warm-up samples for the others.
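In brms, a prior of this kind might be set as follows; whether this matches the exact original specification is an assumption:

```r
library(brms)

# Sketch: weakly informative prior on the standard deviations of the
# group-level (random) intercepts and slopes. brms restricts 'sd' parameters
# to positive values, so cauchy(0, 1) acts as a half-Cauchy(0, 1).
weak_priors <- set_prior("cauchy(0, 1)", class = "sd")

# Passed to any of the models via, e.g.:
# brm(..., prior = weak_priors, chains = 4, thin = 3)
```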
Visual inspection of the trace plots, the number of effective samples, the Gelman-Rubin convergence diagnostic (\(\hat R\)) and the calculation of posterior predictions for the same clusters were all used to assess the fit of the models. In all of the models \(\hat R\) was equal to 1 for all parameters.
The data from all 14 bats that participated in the three experimental nights were included in these models, even though some individuals did not complete all 300 visits on every single night.
All statistical analyses and creation of plots were done in R.
Data availability
All data and analysis code are available online at …..